Claim Listing

Amendments to the Claims 

The Claim Listing below is the original claim listing. No new matter has been added. 

1. (Original) A method for determining whether records are similar in a database containing
both structured and unstructured, free-text data, the method comprising the steps of: 

accessing two of the records from the database for evaluation; and 

evaluating a match between the two records as a weighted match between each of 
a plurality of available fields, such that a matching process is selected as appropriate from 
among a group of matching processes including strict Boolean, ordinal, and vector-based 
matching processes, wherein: 

when a strict Boolean matching process is selected, applying a match
function as an exact match test;

when an ordinal matching process is selected, applying a match function
that makes use of information concerning the size and ordering of the data
domain; and

when a vector-based matching process is selected, applying a match
function that uses a vector space frequency test.
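The three match functions recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names are hypothetical, the ordinal test assumes that similarity falls off linearly with distance relative to the span of the ordered domain, and the vector space frequency test is assumed to be cosine similarity between term-frequency vectors.

```python
import math
import re
from collections import Counter


def boolean_match(x, y):
    """Strict Boolean match: an exact-match test returning 1.0 or 0.0."""
    return 1.0 if x == y else 0.0


def ordinal_match(x, y, domain_min, domain_max):
    """Ordinal match using the size and ordering of the data domain.

    Assumption: similarity decreases linearly with |x - y| relative to the
    domain span (domain_max - domain_min), clamped at 0.0.
    """
    span = domain_max - domain_min
    if span <= 0:
        # Degenerate domain: fall back to an exact-match test.
        return 1.0 if x == y else 0.0
    return max(0.0, 1.0 - abs(x - y) / span)


def _term_frequencies(text):
    """Lower-cased word counts used as a term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def vector_match(text_a, text_b):
    """Vector-space match: cosine similarity of term-frequency vectors."""
    tf_a, tf_b = _term_frequencies(text_a), _term_frequencies(text_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm = math.sqrt(sum(v * v for v in tf_a.values())) * math.sqrt(
        sum(v * v for v in tf_b.values())
    )
    # Empty text has a zero vector; treat it as no match.
    return dot / norm if norm else 0.0
```

For example, boolean_match("NY", "NY") returns 1.0, ordinal_match(30, 40, 0, 100) returns approximately 0.9, and vector_match("red car", "red truck") returns approximately 0.5, since each text shares one of its two terms with the other.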

2. (Original) The method of claim 1 wherein the step of evaluating a match between the two 
records comprises applying the matching process to determine a match score for two 
corresponding fields of the plurality of available fields, the two corresponding fields 
selected from corresponding locations in each of the two records. 

3. (Original) The method of claim 1 wherein the step of evaluating a match between the two 
records comprises selecting the matching process based on a common data type shared by 
both of two fields of the plurality of available fields accessed in the two records. 

4. (Original) The method of claim 3 wherein when a Boolean matching process is selected, 
the data type of both of the two fields specifies nominal data. 

5. (Original) The method of claim 3 wherein when an ordinal matching process is selected, 
the data type of both of the two fields specifies data capable of being ordered. 
6. (Original) The method of claim 3 wherein, when a vector-based matching process is
selected, the data type of both of the two fields specifies text data. 
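The type-driven selection of claims 3 to 6 can be sketched as a lookup from the data type shared by two fields to a matching process. The type labels ("nominal", "ordered", "text") and the MatchProcess enumeration are hypothetical names chosen for illustration, and raising an error when the two fields do not share a type is an assumption; the claims do not say what happens in that case.

```python
from enum import Enum


class MatchProcess(Enum):
    STRICT_BOOLEAN = "strict Boolean"
    ORDINAL = "ordinal"
    VECTOR = "vector-based"


# Hypothetical labels for the data types named in claims 4-6.
_PROCESS_FOR_TYPE = {
    "nominal": MatchProcess.STRICT_BOOLEAN,  # claim 4: nominal data
    "ordered": MatchProcess.ORDINAL,         # claim 5: data capable of being ordered
    "text": MatchProcess.VECTOR,             # claim 6: text data
}


def select_match_process(type_a, type_b):
    """Pick a matching process from the data type shared by two fields."""
    if type_a != type_b:
        # Assumption: fields without a common data type cannot be matched.
        raise ValueError(f"fields do not share a data type: {type_a!r} vs {type_b!r}")
    try:
        return _PROCESS_FOR_TYPE[type_a]
    except KeyError:
        raise ValueError(f"no matching process for data type {type_a!r}") from None
```

A table keeps the type-to-process mapping in one place, so supporting a further data type only requires a new entry.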

7. (Original) The method of claim 1 wherein the step of evaluating the match between the 
two records comprises calculating a similarity score between the two records, as follows: 

$$\mathrm{sim}(\mathit{record}_i, \mathit{record}_j) = w_1 \cdot \mathrm{match}(a_{1i}, a_{1j}) + w_2 \cdot \mathrm{match}(a_{2i}, a_{2j}) + \dots + w_n \cdot \mathrm{match}(a_{ni}, a_{nj})$$

wherein sim is a similarity function that determines the similarity score for 
the two records; 

$\mathit{record}_i$ is a first record of the two records and is identified in the database
by an iterator $i$;

$\mathit{record}_j$ is a second record of the two records and is identified in the
database by an iterator $j$;

iterator $n$ identifies a field position for a given field $a_{ni}$ in the $\mathit{record}_i$ and
a corresponding field position for a given field $a_{nj}$ in the $\mathit{record}_j$;

match indicates the match function; and

a symbol $w_n$ indicates a predefined weight for each result of each match
function.
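The weighted similarity score of claim 7 can be sketched directly from the formula: fields are paired by position $n$, each pair is scored by that position's match function, and the scores are combined with the predefined weights $w_n$. The function names, the per-field list of match functions, and the example weights are illustrative assumptions, not values taken from the claims.

```python
from typing import Callable, Sequence

MatchFn = Callable[[object, object], float]


def similarity(record_i: Sequence, record_j: Sequence,
               weights: Sequence[float], match_fns: Sequence[MatchFn]) -> float:
    """sim(record_i, record_j) = sum over n of w_n * match(a_ni, a_nj).

    Field n of record_i is compared with field n of record_j using the
    match function chosen for that position, then scaled by weight w_n.
    """
    if not (len(record_i) == len(record_j) == len(weights) == len(match_fns)):
        raise ValueError("records, weights and match functions must align by field")
    return sum(w * match(a_i, a_j)
               for w, match, a_i, a_j in zip(weights, match_fns, record_i, record_j))


def exact(x, y):
    """Hypothetical strict Boolean match function used in the example."""
    return 1.0 if x == y else 0.0
```

With weights 0.5, 0.2 and 0.3 and an exact match on every field, similarity(("Smith", "NY", 42), ("Smith", "NJ", 42), [0.5, 0.2, 0.3], [exact] * 3) evaluates to about 0.8, because only the second field differs.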

8. (Original) The method of claim 1 wherein the database is a relational database, the 
records are tuples, and the fields are attributes. 
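For the relational case of claim 8, the sketch below uses Python's built-in sqlite3 module as a stand-in relational database: each row read back is a tuple, and its column names are the attributes compared field by field. The customers table, its columns, and the helper names are hypothetical, and only a per-attribute exact-match test is shown.

```python
import sqlite3


def fetch_tuple(conn, table, row_id):
    """Read one row (tuple) by id as an attribute-name -> value mapping."""
    # SQLite cannot bind identifiers as parameters, so the table name is
    # interpolated; it must come from trusted code, never from user input.
    cur = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (row_id,))
    row = cur.fetchone()
    if row is None:
        raise KeyError(row_id)
    columns = [d[0] for d in cur.description]
    return dict(zip(columns, row))


def attribute_matches(t_i, t_j, skip=("id",)):
    """Exact-match test per attribute, ignoring the key column."""
    return {attr: t_i[attr] == t_j[attr] for attr in t_i if attr not in skip}


def demo():
    """Build a throwaway in-memory table and compare two of its tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
        )
        conn.executemany(
            "INSERT INTO customers VALUES (?, ?, ?, ?)",
            [(1, "Smith", "NY", 42), (2, "Smith", "NJ", 42)],
        )
        t1 = fetch_tuple(conn, "customers", 1)
        t2 = fetch_tuple(conn, "customers", 2)
        return attribute_matches(t1, t2)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

The per-attribute results could then be fed to whichever match function and weight each attribute is assigned.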

9. (Original) A data processing system for determining whether records are similar in a 
database containing both structured and unstructured, free-text data, the data processing 
system comprising: 

a communications interface for communicating with the database; and 
a processor coupled to the communications interface, the processor hosting and 
executing a data evaluation application that is configured to: 

access two of the records from the database for evaluation; and 
evaluate a match between the two records as a weighted match between 
each of a plurality of available fields, such that a matching process is selected as 
appropriate from among a group of matching processes including strict Boolean, 
ordinal, and vector-based matching processes, wherein: 
when a strict Boolean matching process is selected, apply a match
function as an exact match test; 

when an ordinal matching process is selected, apply a match 
function that makes use of information concerning the size and ordering of 
the data domain; and 

when a vector-based matching process is selected, apply a match 
function that uses a vector space frequency test. 

10. (Original) The data processing system of claim 9 wherein the data evaluation application 
is configured to apply the matching process to determine a match score for two 
corresponding fields of the plurality of available fields, the two corresponding fields 
selected from corresponding locations in each of the two records. 

11. (Original) The data processing system of claim 9 wherein the data evaluation application
is configured to select the matching process based on a common data type shared by both 
of two fields of the plurality of available fields accessed in the two records. 

12. (Original) The data processing system of claim 11 wherein when the data evaluation
application selects a Boolean matching process, the data type of both of the two fields 
specifies nominal data. 

13. (Original) The data processing system of claim 11 wherein when the data evaluation
application selects an ordinal matching process, the data type of both of the two fields 
specifies data capable of being ordered. 

14. (Original) The data processing system of claim 11 wherein, when the data evaluation
application selects a vector-based matching process, the data type of both of the two 
fields specifies text data. 

15. (Original) The data processing system of claim 9 wherein the data evaluation application
is configured to calculate a similarity score between the two records, as follows:

$$\mathrm{sim}(\mathit{record}_i, \mathit{record}_j) = w_1 \cdot \mathrm{match}(a_{1i}, a_{1j}) + w_2 \cdot \mathrm{match}(a_{2i}, a_{2j}) + \dots + w_n \cdot \mathrm{match}(a_{ni}, a_{nj})$$

wherein sim is a similarity function that determines the similarity score for 
the two records; 
$\mathit{record}_i$ is a first record of the two records and is identified in the database
by an iterator $i$;

$\mathit{record}_j$ is a second record of the two records and is identified in the
database by an iterator $j$;

iterator $n$ identifies a field position for a given field $a_{ni}$ in the $\mathit{record}_i$ and
a corresponding field position for a given field $a_{nj}$ in the $\mathit{record}_j$;

match indicates the match function; and

a symbol $w_n$ indicates a predefined weight for each result of each match
function.

16. (Original) The data processing system of claim 9 wherein the database is a 
relational database, the records are tuples, and the fields are attributes. 



